DeepKey: Towards End-to-End Physical Key Replication From a Single Photograph
This paper describes DeepKey, an end-to-end deep neural architecture capable
of taking a digital RGB image of an 'everyday' scene containing a pin tumbler
key (e.g. lying on a table or carpet) and fully automatically inferring a
printable 3D key model. We report on the key detection performance and describe
how candidates can be transformed into physical prints. We show an example
opening a real-world lock. Our system is described in detail, providing a
breakdown of all components including key detection, pose normalisation,
bitting segmentation and 3D model inference. We provide an in-depth evaluation
and conclude by reflecting on limitations, applications, potential security
risks and societal impact. We contribute the DeepKey Datasets of 5,300+ images
covering a few test keys with bounding boxes, pose and unaligned mask data.
Comment: 14 pages, 12 figures
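A minimal sketch of the four-stage pipeline the abstract names (key detection, pose normalisation, bitting segmentation, 3D model inference), with hypothetical function names and placeholder logic rather than the authors' actual components:

```python
# Hypothetical pipeline sketch: detection -> pose normalisation ->
# bitting segmentation -> bitting-code inference. All names, shapes and
# logic here are assumptions, not DeepKey's API.
import numpy as np

def detect_key(image: np.ndarray) -> np.ndarray:
    """Return a crop around the detected key (stand-in for the detector)."""
    return image  # placeholder: a real detector returns a bounding-box crop

def normalise_pose(crop: np.ndarray) -> np.ndarray:
    """Warp the crop to a canonical, fronto-parallel key pose."""
    return crop  # placeholder: e.g. a learned affine/homography warp

def segment_bitting(aligned: np.ndarray) -> np.ndarray:
    """Binary mask of the bitting (tooth-cut) region along the blade."""
    return (aligned.mean(axis=-1) > 128).astype(np.uint8)

def infer_bitting_code(mask: np.ndarray, n_pins: int = 5) -> list[int]:
    """Quantise cut depths at each pin position into a printable code."""
    columns = np.array_split(mask.sum(axis=0), n_pins)
    return [int(c.mean()) for c in columns]

photo = np.zeros((128, 256, 3), dtype=np.uint8)  # dummy input image
code = infer_bitting_code(segment_bitting(normalise_pose(detect_key(photo))))
```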
Depth Estimation Through a Generative Model of Light Field Synthesis
Light field photography captures rich structural information that may
facilitate a number of traditional image processing and computer vision tasks.
A crucial ingredient in such endeavors is accurate depth recovery. We present a
novel framework that allows the recovery of a high quality continuous depth map
from light field data. To this end we propose a generative model of a light
field that is fully parametrized by its corresponding depth map. The model
allows for the integration of powerful regularization techniques such as a
non-local means prior, facilitating accurate depth map estimation.
Comment: German Conference on Pattern Recognition (GCPR) 2016
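A rough sketch of the generative idea, under stated assumptions: each light field view is synthesised by warping the central view according to the disparity implied by the depth map, and depth is recovered by minimising the reconstruction error plus a regulariser (a simple gradient penalty stands in here for the paper's non-local means prior):

```python
# Sketch only: depth-parametrised light field synthesis plus an energy to
# minimise. Function names and the smoothness term are assumptions.
import numpy as np
from scipy.ndimage import map_coordinates

def synthesise_view(center, disparity, du, dv):
    """Warp the central view to angular offset (du, dv) via per-pixel disparity."""
    h, w = center.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    return map_coordinates(center, [y + dv * disparity, x + du * disparity],
                           order=1, mode='nearest')

def energy(disparity, center, views, offsets, lam=0.1):
    """Photometric data term over all views plus a simple smoothness prior."""
    data = sum(np.sum((synthesise_view(center, disparity, du, dv) - v) ** 2)
               for (du, dv), v in zip(offsets, views))
    smooth = np.sum(np.diff(disparity, axis=0) ** 2) + \
             np.sum(np.diff(disparity, axis=1) ** 2)
    return data + lam * smooth
```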
Room reflections and constancy in speech-like sounds: within-band effects
The experiment asks whether constancy in hearing precedes or follows grouping. Listeners heard speech-like
sounds comprising 8 auditory-filter-shaped noise bands whose temporal envelopes corresponded to those
arising in these filters when a speech message is played. The 'context' words in the message were "next you'll
get _ to click on", into which a "sir" or "stir" test word was inserted. These test words were drawn from an
11-step continuum that was formed by amplitude modulation. Listeners identified the test words appropriately
and quite consistently, even though they had the 'robotic' quality typical of this type of 8-band speech. The
speech-like effects of these sounds appear to be a consequence of auditory grouping. Constancy was assessed by
comparing the influence of room reflections on the test word across conditions where the context had either the
same level of reflections or a much lower level. Constancy effects were obtained with these 8-band sounds,
but only in 'matched' conditions, where the room reflections were in the same bands in both the context and the
test word. In the comparison 'mismatched' condition, no constancy effects were found. It would appear that this
type of constancy in hearing precedes the across-channel grouping whose effects are so apparent in these sounds.
This result is discussed in terms of the ubiquity of grouping across different levels of representation.
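As a rough illustration of how an amplitude-modulation continuum of this kind can be built (the details below are assumptions, not the study's stimulus-generation procedure), each step interpolates between the two words' temporal envelopes and imposes the result on a noise carrier:

```python
# Hedged sketch of an 11-step "sir"-"stir" continuum via amplitude
# modulation. Assumes the two recordings are time-aligned and equal length.
import numpy as np

def envelope(x, fs, cutoff=50.0):
    """Crude temporal envelope: rectify, then moving-average smooth."""
    win = max(1, int(fs / cutoff))
    return np.convolve(np.abs(x), np.ones(win) / win, mode='same')

def continuum(sir, stir, fs, steps=11):
    """Return `steps` stimuli morphing the envelope from 'sir' to 'stir'."""
    rng = np.random.default_rng(0)
    carrier = rng.standard_normal(len(sir))      # noise-band carrier
    e_sir, e_stir = envelope(sir, fs), envelope(stir, fs)
    return [((1 - a) * e_sir + a * e_stir) * carrier
            for a in np.linspace(0.0, 1.0, steps)]
```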
Baseline and triangulation geometry in a standard plenoptic camera
In this paper, we demonstrate light field triangulation to determine depth distances and baselines in a plenoptic camera. Advances in micro lenses and image sensors have enabled plenoptic cameras to capture a scene from different viewpoints with sufficient spatial resolution. While object distances can be inferred from disparities in a stereo viewpoint pair using triangulation, this concept remains ambiguous when applied to plenoptic cameras. We present a geometrical light field model that allows triangulation to be applied to a plenoptic camera in order to predict object distances or to specify baselines as desired. It is shown that distance estimates from our novel method match those of real objects placed in front of the camera. Additional benchmark tests with an optical design software further validate the model's accuracy, with deviations of less than 0.33 % for several main lens types and focus settings. A variety of applications in the automotive and robotics fields can benefit from this estimation model.
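The triangulation relation at the heart of this approach is the standard one: an object at distance Z produces disparity d between two viewpoints separated by baseline B behind a lens of focal length f, so Z = fB/d. A minimal sketch (the numbers below are illustrative assumptions, not calibrated values):

```python
# Standard stereo/plenoptic triangulation: Z = f * B / d.
def depth_from_disparity(f_mm: float, baseline_mm: float, disparity_mm: float) -> float:
    """Object distance (mm) from focal length, baseline and disparity."""
    if disparity_mm <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f_mm * baseline_mm / disparity_mm

# e.g. a 50 mm main lens, a 2 mm virtual baseline between sub-aperture
# views, and 0.02 mm sensor-plane disparity give a 5 m object distance.
print(depth_from_disparity(50.0, 2.0, 0.02))  # 5000.0 mm
```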
Cortical Maps
In this article, we review functional organization in sensory cortical regions: how the cortex represents the world. We consider four interrelated aspects of cortical organization: (1) the set of receptive fields of individual cortical sensory neurons, (2) how lateral interaction between cortical neurons reflects the similarity of their receptive fields, (3) the spatial distribution of receptive-field properties across the horizontal extent of the cortical tissue, and (4) how the spatial distributions of different receptive-field properties interact with one another. We show how these data are generally well explained by the theory of input-driven self-organization, with a family of computational models of cortical maps offering a parsimonious account for a wide range of map-related phenomena. We then discuss important challenges to this explanation, with respect to the maps present at birth, maps present under activity blockade, the limits of adult plasticity, and the lack of some maps in rodents. Because there is not at present another credible general theory for cortical map development, we conclude by proposing key experiments to help uncover other mechanisms that might also be operating during map development.
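Input-driven self-organization can be illustrated in miniature with a generic one-dimensional Kohonen-style map (a deliberately simplified sketch, not the LISSOM-family models the article reviews): units develop topographically ordered stimulus preferences from input statistics alone.

```python
# Generic 1-D self-organising map: a neighbourhood-weighted update pulls
# the winning unit and its neighbours toward each stimulus, so ordered
# "maps" of stimulus preference emerge from the input distribution.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_steps = 20, 5000
w = rng.random(n_units)                   # each unit's preferred stimulus

for t in range(n_steps):
    x = rng.random()                      # stimulus from the input ensemble
    winner = np.argmin(np.abs(w - x))     # best-matching unit
    dist = np.abs(np.arange(n_units) - winner)
    h = np.exp(-dist**2 / (2 * 2.0**2))   # Gaussian neighbourhood function
    w += 0.1 * h * (x - w)

# Typically True: preferences become topographically ordered along the map.
print(np.all(np.diff(w) > 0) or np.all(np.diff(w) < 0))
```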
Learning to Extract Motion from Videos in Convolutional Neural Networks
This paper shows how to extract dense optical flow from videos with a
convolutional neural network (CNN). The proposed model constitutes a potential
building block for deeper architectures to allow using motion without resorting
to an external algorithm, e.g. for recognition in videos. We derive our network
architecture from signal processing principles to provide desired invariances
to image contrast, phase and texture. We constrain weights within the network
to enforce strict rotation invariance and substantially reduce the number of
parameters to learn. We demonstrate end-to-end training on only 8 sequences of
the Middlebury dataset, orders of magnitude less than competing CNN-based
motion estimation methods, and obtain comparable performance to classical
methods on the Middlebury benchmark. Importantly, our method outputs a
distributed representation of motion that allows representing multiple,
transparent motions, and dynamic textures. Our contributions on network design
and rotation invariance offer insights that are not specific to motion estimation.
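One way to picture the weight-tying idea (an illustration of the general technique, not the authors' exact architecture): learn a single base filter and derive the rest of the bank as rotated copies, so the learned parameter count shrinks by the number of orientations.

```python
# Rotation-tied filter bank: only `base` carries learnable parameters; the
# other orientations are derived by rotation, so rotating the input
# (approximately) permutes the filter responses.
import numpy as np
from scipy.ndimage import rotate

def rotated_bank(base: np.ndarray, n_orientations: int = 8) -> np.ndarray:
    """Stack of `n_orientations` rotated copies of one learned filter."""
    angles = np.linspace(0, 360, n_orientations, endpoint=False)
    return np.stack([rotate(base, a, reshape=False, order=1) for a in angles])

base = np.random.default_rng(0).standard_normal((7, 7))  # one learned filter
bank = rotated_bank(base)                                 # 8 tied filters
print(bank.shape)                                         # (8, 7, 7)
```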
Feature Pyramid Transformer
Feature interactions across space and scales underpin modern visual
recognition systems because they introduce beneficial visual contexts.
Conventionally, spatial contexts are passively hidden in the CNN's increasing
receptive fields or actively encoded by non-local convolution. Yet, the
non-local spatial interactions are not across scales, and thus they fail to
capture the non-local contexts of objects (or parts) residing in different
scales. To this end, we propose a fully active feature interaction across both
space and scales, called Feature Pyramid Transformer (FPT). It transforms any
feature pyramid into another feature pyramid of the same size but with richer
contexts, by using three specially designed transformers in self-level,
top-down, and bottom-up interaction fashion. FPT serves as a generic visual
backbone with fair computational overhead. We conduct extensive experiments in
both instance-level (i.e., object detection and instance segmentation) and
pixel-level segmentation tasks, using various backbones and head networks, and
observe consistent improvement over all the baselines and the state-of-the-art
methods.
Comment: Published at the European Conference on Computer Vision, 2020
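A simplified sketch of cross-scale interaction in the spirit of FPT (plain dot-product attention stands in for the three specially designed transformers): each flattened pyramid level is enriched by attending to itself, to a coarser level, or to a finer one.

```python
# Cross-scale attention sketch: queries from one pyramid level gather
# context from another level; output has the query level's size.
import numpy as np

def attend(query_feats, key_feats):
    """query_feats: (Nq, C); key_feats: (Nk, C). Returns (Nq, C)."""
    scores = query_feats @ key_feats.T / np.sqrt(query_feats.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ key_feats

rng = np.random.default_rng(0)
fine = rng.standard_normal((32 * 32, 256))    # fine pyramid level, flattened
coarse = rng.standard_normal((8 * 8, 256))    # coarse pyramid level
self_level = attend(fine, fine)               # self-level interaction
top_down = attend(fine, coarse)               # fine queries coarse context
bottom_up = attend(coarse, fine)              # coarse queries fine detail
print(top_down.shape)                         # (1024, 256): same size, richer context
```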
Accidental Pinhole and Pinspeck Cameras
We identify and study two types of "accidental" images that can form in scenes. The first is the accidental pinhole camera image. The second is the "inverse" pinhole camera image, formed by subtracting an image with a small occluder present from a reference image without the occluder. Both types of accidental camera arise in a variety of situations: for example, an indoor scene illuminated by natural light, or a street with a person walking under the shadow of a building. The images produced by accidental cameras are often mistaken for shadows or interreflections, yet they can reveal information about the scene outside the image, the lighting conditions, or the aperture by which light enters the scene.
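The "inverse" pinhole computation the abstract describes reduces to a frame difference; a minimal sketch:

```python
# Inverse ("pinspeck") camera: the difference between a reference frame and
# a frame with a small occluder is (up to noise) the light the occluder
# blocked, i.e. an inverted pinhole-like image of the outside scene.
import numpy as np

def inverse_pinhole(reference: np.ndarray, with_occluder: np.ndarray) -> np.ndarray:
    """Difference image, flipped to undo the pinhole inversion."""
    diff = reference.astype(float) - with_occluder.astype(float)
    return diff[::-1, ::-1]  # pinhole images form inverted; flip upright
```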
Vehicle Re-identification in Context
Existing vehicle re-identification (re-id) evaluation benchmarks consider strongly artificial test scenarios by assuming the availability of high-quality images and fine-grained appearance at an almost constant image scale, reminiscent of images required for Automatic Number Plate Recognition, e.g. VeRi-776. Such assumptions are often invalid in realistic vehicle re-id scenarios, where arbitrarily changing image resolutions (scales) are the norm. This makes the existing vehicle re-id benchmarks limited for testing the true performance of a re-id method. In this work, we introduce a more realistic and challenging vehicle re-id benchmark, called Vehicle Re-Identification in Context (VRIC). In contrast to existing vehicle re-id datasets, VRIC is uniquely characterised by vehicle images subject to more realistic and unconstrained variations in resolution (scale), motion blur, illumination, occlusion, and viewpoint. It contains 60,430 images of 5,622 vehicle identities captured by 60 different cameras in heterogeneous road traffic scenes in both day-time and night-time. Given the nature of this new benchmark, we further investigate a multi-scale matching approach to vehicle re-id by learning more discriminative feature representations from multi-resolution images. Extensive evaluations show that the proposed multi-scale method outperforms the state-of-the-art vehicle re-id methods on three benchmark datasets: VehicleID, VeRi-776, and VRIC (available at http://qmul-vric.github.io).
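A hedged sketch of the multi-scale matching idea (the general recipe, not the paper's network): embed each image at several resolutions, fuse the per-scale embeddings into one descriptor, and rank the gallery by cosine similarity.

```python
# Multi-scale descriptor sketch. `embed` is a trivial stand-in for a CNN
# embedding; any per-scale feature extractor could be substituted.
import numpy as np

def embed(image: np.ndarray) -> np.ndarray:
    """Placeholder embedding: L2-normalised global colour descriptor."""
    v = image.mean(axis=(0, 1))
    return v / (np.linalg.norm(v) + 1e-8)

def multiscale_descriptor(image: np.ndarray, strides=(1, 2, 4)) -> np.ndarray:
    """Concatenate embeddings of the image subsampled at several strides."""
    d = np.concatenate([embed(image[::s, ::s]) for s in strides])
    return d / (np.linalg.norm(d) + 1e-8)

def rank_gallery(query_desc: np.ndarray, gallery_descs: np.ndarray) -> np.ndarray:
    """Gallery indices sorted best-first by cosine similarity (unit norms)."""
    return np.argsort(-(gallery_descs @ query_desc))
```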